
    Frequency domain analysis of MFCC feature extraction in children’s speech recognition system

    Abstract — The research on speech recognition systems currently focuses on the analysis of robust speech recognition systems. When speech signals are combined with noise, the recognition system is disrupted and struggles to identify the speech sounds. Therefore, the development of robust speech recognition systems continues. The principle of a robust speech recognition system is to eliminate noise from the speech signals and restore the original information signals. In this paper, the researchers conducted a frequency domain analysis of one stage of the Mel Frequency Cepstral Coefficients (MFCC) process, the Fast Fourier Transform (FFT), in a children's speech recognition system. The FFT analysis in the feature extraction process determined the effect of the frequency characteristics of the FFT output on noise disruption. The analysis method was designed into three scenarios based on the employed FFT points. The scenarios differed in the number of parts into which the FFT points were divided: all FFT points were divided into four, three, and two parts in the first, second, and third scenarios, respectively. This study utilized children's speech data from the isolated TIDIGIT English digit corpus. As comparative data, noise was added manually to simulate real-world conditions. The results showed that using a particular frequency portion, following the scenarios designed on MFCC, affected the recognition system performance, and the effect was relatively significant on the noisy speech data. The designed method in the scenario 3 (C1) version generated the highest accuracy, exceeding the accuracy of the conventional MFCC method. The average accuracy of the scenario 3 (C1) method increased by 1% over the conventional method across all the tested noise types. Testing with various noise intensity values (SNR) indicates that scenario 3 (C1) generates higher accuracy than conventional MFCC at all tested SNR values. This proves that the selection of the specific frequencies utilized in MFCC feature extraction significantly affects the recognition accuracy for noisy speech.
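    The abstract does not give the exact portion-selection rule, so the following is only a minimal sketch of the general idea: compute the FFT of one speech frame and divide the resulting spectrum into equal frequency portions before mel filtering. The function names, the 512-point FFT, and the choice of keeping the first portion are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def frame_spectrum(frame, n_fft=512):
    """Magnitude spectrum of one windowed speech frame (positive-frequency FFT bins)."""
    windowed = frame * np.hamming(len(frame))
    return np.abs(np.fft.rfft(windowed, n=n_fft))

def split_spectrum(spectrum, n_parts):
    """Divide the FFT output into equal frequency portions, in the spirit of the
    paper's scenarios (4, 3, or 2 parts); the selection rule here is illustrative."""
    return np.array_split(spectrum, n_parts)

# Example: split a 512-point FFT into two portions and keep the low-frequency one
frame = np.random.randn(400)                 # stand-in for a 25 ms frame at 16 kHz
parts = split_spectrum(frame_spectrum(frame), n_parts=2)
selected = parts[0]                          # hypothetical choice of the first portion
print(selected.shape)
```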

    Object Recognition of Monochromatic Images In The Frequency Domain

    A large image usually consists of several smaller objects. People can recognize these objects automatically. The objects can be differentiated because they have different patterns. The aim of this research is for a computer to recognize an object in an image. The objects to be recognized are transformed to the frequency domain, so that frequency spectra are obtained as patterns, and these spectra are used as input. The objects are sampled at 4x4, 8x8, and 16x16 pixels. The object recognition uses an Artificial Neural Network with a step activation function. From this research, it is found that pattern recognition with frequency-spectrum inputs is resistant to changes in position, such as rotation, translation, and reflection. Thus the same object is still recognized well, even though it is in a different position or place. Our experimental results show that pattern recognition in the frequency domain is more resistant to position changes than pattern recognition in the spatial domain.
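    A minimal sketch of the described pipeline, assuming the simplest possible setup: the magnitude of the 2-D FFT of a small patch is used as the input feature vector of a single-layer perceptron with a step activation. The network size, weights, and patch size are illustrative, not the paper's trained model.

```python
import numpy as np

def spectrum_features(patch):
    """Flattened magnitude spectrum of a small monochromatic patch (e.g. 8x8 pixels).
    The FFT magnitude is unchanged by (circular) translation of the object."""
    return np.abs(np.fft.fft2(patch)).ravel()

def step(x):
    """Step activation function."""
    return np.where(x >= 0.0, 1.0, 0.0)

def perceptron_predict(patch, weights, bias):
    """Single-layer perceptron on spectrum features, standing in for the paper's ANN."""
    return step(weights @ spectrum_features(patch) + bias)

# Example with random (untrained) weights on an 8x8 patch
patch = np.random.rand(8, 8)
w = np.random.randn(64)
print(perceptron_predict(patch, w, bias=0.0))
```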

    Privacy preserving pattern matching: implementation issues

    Master's thesis in information and communication technology, 2002 - Høgskolen i Agder, Grimstad. The growth of the Internet has the potential to erode personal privacy. Private data can be monitored over the Internet without the data owner's knowledge. This project implements a technique that allows pattern matching on user data while still preserving user privacy. Such a technique makes it possible to place data on an insecure third-party site and perform matching without revealing information about either the data or the pattern. The system uses public-key cryptography to protect data on the insecure third-party site and to control the disclosure of data. The data owner participates in data decryption together with the permitted data user.
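    The abstract does not describe the thesis's actual scheme, so the snippet below is only an illustration of the general idea using a different, much simpler technique: the owner stores keyed tags (HMACs) of document tokens on the third-party site, which can then match a permitted user's pattern without ever seeing plaintext. The key-distribution line and the token model are hypothetical; the public-key encryption of the data itself is omitted.

```python
import hmac, hashlib

def tag(token, key):
    """Keyed tag of a token; the third party stores and compares only tags."""
    return hmac.new(key, token.encode(), hashlib.sha256).hexdigest()

# Data owner side: upload tags of the document's tokens to the third-party site.
owner_key = b"shared-with-permitted-users-only"   # hypothetical key distribution
stored_tags = {tag(t, owner_key) for t in "the quick brown fox".split()}

# Permitted user side: match a pattern without revealing it to the third party.
def matches(pattern, tags, key):
    return all(tag(t, key) in tags for t in pattern.split())

print(matches("quick fox", stored_tags, owner_key))   # True
print(matches("lazy dog", stored_tags, owner_key))    # False
```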

    Wavelet Based Feature Extraction for The Indonesian CV Syllables Sound

    This paper proposes the combined methods of the Wavelet Transform (WT) and Euclidean Distance (ED) to estimate the expected value of the feature vector of Indonesian syllables. This research aims to find the most effective and efficient properties for performing feature extraction of each syllable sound, to be applied in speech recognition systems. The proposed approach, which builds on the previous study, consists of three main phases. In the first phase, the speech signal is segmented and normalized. In the second phase, the signal is transformed into the frequency domain using the WT. In the third phase, the ED algorithm is used to estimate the expected feature vector. The results show a list of features for each syllable that can be used in further research, along with some recommendations on the most effective and efficient WT to use for syllable sound recognition.
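    A minimal sketch of the second and third phases using PyWavelets. The wavelet family ('db4'), decomposition level, and the crude length alignment in the distance function are assumptions; the paper evaluates several WT variants and may aggregate coefficients differently.

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_features(signal, wavelet="db4", level=4):
    """Concatenated wavelet decomposition coefficients of a segmented, normalized syllable."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return np.concatenate(coeffs)

def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors (truncated to equal length)."""
    n = min(len(a), len(b))
    return np.linalg.norm(a[:n] - b[:n])

# Example: compare two syllable recordings (random stand-in signals)
x = np.random.randn(4000)
y = np.random.randn(4000)
print(euclidean_distance(wavelet_features(x), wavelet_features(y)))
```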

    Real-Time Indonesian Language Speech Recognition with MFCC Algorithms and Python-Based SVM

    Abstract — Automatic Speech Recognition (ASR) is a technology that uses machines to process and recognize the human voice. One way to increase the recognition rate is to use a model of the language to be recognized. In this paper, a speech recognition application is introduced to recognize the words "atas" (up), "bawah" (down), "kanan" (right), and "kiri" (left). This research used 400 samples of speech data: 75 samples of each word for training data and 25 samples of each word for test data. The speech recognition system was designed using 13 Mel Frequency Cepstral Coefficients (MFCC) as features and a Support Vector Machine (SVM) as the classifier. The system was tested with linear and RBF kernels, various cost values, and three sample sizes (n = 25, 75, 50). The best average accuracy was obtained from the SVM using a linear kernel, a cost value of 100, and a data set consisting of 75 samples from each class. During the training phase, the system showed an F1-score (the harmonic mean of precision and recall) of 80% for the word "atas", 86% for the word "bawah", 81% for the word "kanan", and 100% for the word "kiri". Using 25 new samples per class in the testing phase, the F1-score was 76% for the "atas" class, 54% for the "bawah" class, 44% for the "kanan" class, and 100% for the "kiri" class.
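    A sketch of the described setup with librosa and scikit-learn, matching the reported 13 MFCCs, linear kernel, and cost value of 100. Averaging the MFCCs over frames into one fixed-length vector and the placeholder training arrays are assumptions; the paper's exact feature aggregation and data loading are not given in the abstract.

```python
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.metrics import classification_report

def mfcc_features(path, sr=16000, n_mfcc=13):
    """13 MFCCs per frame, averaged over time into one fixed-length vector (an assumption)."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

# X_train / X_test would be built from the 75 + 25 samples per word described above;
# random placeholders are used here so the sketch runs on its own.
X_train = np.random.randn(300, 13); y_train = np.repeat(["atas", "bawah", "kanan", "kiri"], 75)
X_test  = np.random.randn(100, 13); y_test  = np.repeat(["atas", "bawah", "kanan", "kiri"], 25)

clf = SVC(kernel="linear", C=100)   # best-performing configuration reported in the abstract
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```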

    Comparison of multi-distance signal level difference Hjorth descriptor and its variations for lung sound classifications

    A biological signal has multi-scale and complexity properties. Many studies have used signal complexity calculation methods and multi-scale analysis to analyze biological signals such as lung sounds. Signal complexity methods used in biological signal analysis include entropy, fractal analysis, and the Hjorth descriptor. Meanwhile, the commonly used multi-scale methods include wavelet analysis, the coarse-graining procedure, and empirical mode decomposition (EMD). One of the multi-scale methods in biological signal analysis is the multi-distance signal level difference (MSLD), which calculates the difference between two signal samples at a specific distance. In previous studies, MSLD was combined with the Hjorth descriptor for lung sound classification. MSLD has the potential to be developed further by modifying its fundamental equation. This study presents a comparison of MSLD and its variations, combined with the Hjorth descriptor, for lung sound classification. The results showed that MSLD and its variations achieved the highest accuracy of 98.99% for five classes of lung sound data. The results of this study provide several alternative multi-scale signal complexity analysis methods for biological signals.
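    A minimal sketch of the combination described above. The Hjorth descriptors (activity, mobility, complexity) are standard; the MSLD step follows the one-line description in the abstract (difference between samples spaced a distance apart), so the exact equation and its variations in the paper may differ, and the chosen distances are illustrative.

```python
import numpy as np

def msld(x, d):
    """Multi-distance signal level difference at distance d (as described in the abstract)."""
    return x[d:] - x[:-d]

def hjorth(x):
    """Standard Hjorth descriptors: activity, mobility, complexity."""
    dx = np.diff(x)
    ddx = np.diff(dx)
    activity = np.var(x)
    mobility = np.sqrt(np.var(dx) / activity)
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity

# Feature vector for one lung sound: Hjorth descriptors of the MSLD at several distances
signal = np.random.randn(8000)
features = np.hstack([hjorth(msld(signal, d)) for d in (1, 2, 3, 4)])
print(features.shape)   # (12,)
```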

    Analysis of the Indonesian Vowel /e/ For Lip Synchronization Animation

    Currently, voice recognition technology is widely used to produce lip-sync animation. Vowels play the most dominant role in lip-sync animation, as a vowel exists in every syllable. Therefore, it is necessary to select appropriate vowel traits for the system to be accurate. In general, there are five vowels in the Indonesian language, namely /a/, /i/, /u/, /e/, and /o/. However, two vowels contain several different tones: /o/, which is pronounced /o/ or /O/, and /e/, which is pronounced /e/, /ǝ/, or /ɛ/. The difference in tone can affect the accuracy of voice recognition in the lip-sync animation system if it is not specified further. In this paper, the characteristic values of the vowels /e/, /ǝ/, and /ɛ/ are compared and analyzed to find the significance of the difference. The characteristic values examined are the formant frequencies (F1, F2, and F3), extracted using the Praat software. The comparison is done using a statistical t-test. The results show that the three vowel tones of /e/ have significant differences for all of F1 and most of F
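    A sketch of the statistical comparison step only. Formant extraction was done in Praat and is not reproduced here; the F1 arrays below are placeholders, not measured values.

```python
import numpy as np
from scipy import stats

# Placeholder F1 values (Hz) for two vowel tones; real values come from Praat measurements.
f1_e     = np.array([412.0, 405.3, 398.7, 420.1, 409.5])
f1_schwa = np.array([512.4, 498.9, 505.2, 520.6, 509.8])

# Two-sample t-test on the formant frequency, as in the abstract's comparison
t_stat, p_value = stats.ttest_ind(f1_e, f1_schwa)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")   # p < 0.05 indicates a significant difference
```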

    Improving Phoneme to Viseme Mapping for Indonesian Language

    Lip synchronization in animation can run automatically through a phoneme-to-viseme map. Since the complexity of facial muscles causes the shape of the mouth to vary greatly, phoneme-to-viseme mapping always presents challenging problems. One of them is the allophone vowel problem: the resemblance between allophones leads many researchers to cluster them into one class. This paper discusses the treatment of allophone vowels as a variable of the phoneme-to-viseme map. In the proposed method, vowel allophones are pre-processed through formant frequency feature extraction and then compared with a t-test to determine the significance of the differences. The results of the pre-processing are then used as reference data when building the phoneme-to-viseme maps. This research was conducted on maps and allophones of the Indonesian language. The maps that were built are then compared with other maps using the HMM method in terms of word correctness and accuracy. The results show that viseme mapping preceded by allophonic pre-processing makes the map's performance more accurate compared to the other maps.
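    A minimal sketch of how such a map can be represented and applied, assuming that after the allophone pre-processing the tones of /e/ keep separate entries instead of being collapsed into one class. The viseme labels and the example phoneme sequence are purely illustrative; they are not the map built in the paper.

```python
# Hypothetical phoneme-to-viseme map fragment: /e/, /ə/ and /ɛ/ keep their own entries.
PHONEME_TO_VISEME = {
    "a": "V_open",
    "i": "V_spread",
    "u": "V_round",
    "e": "V_mid_front",
    "ə": "V_mid_central",
    "ɛ": "V_open_front",
    "o": "V_round_mid",
}

def phonemes_to_visemes(phonemes):
    """Map a phoneme sequence to a viseme sequence for lip-sync key frames."""
    return [PHONEME_TO_VISEME.get(p, "V_rest") for p in phonemes]

print(phonemes_to_visemes(["s", "ə", "l", "a", "m", "a", "t"]))
```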

    An Experimental Study of Conducted EMI Mitigation on the LED Driver using Spread Spectrum Technique

    An LED driver has the potential to interfere with other electronic devices if its voltage and current change rapidly. Several previous studies presented various solutions to overcome this problem, such as particular converter designs, component designs, electromagnetic interference (EMI) filters, and spread-spectrum techniques. Compared to other solutions, the spread-spectrum technique is the most promising way to reduce EMI in LED applications because of its limited cost, size, and weight. In this paper, the effectiveness of conducted-EMI suppression and its effect on LED luminance using spread-spectrum techniques are investigated. Spread spectrum is applied to the system by modifying the switching frequency through disturbances injected at the IADJ pin. The disturbance is applied in the form of four signals, namely square, filtered-square, triangular, and sine waveforms. The highest level of EMI suppression, about 31.89%, is achieved when the LED driver is given an 800 mVpp filtered-square waveform. The highest power-level reduction at the fundamental reference frequency, about 81.77%, occurs when a 700 mVpp square disturbance signal is applied. The LED luminance level reduces by 85.2% when the four disturbance waveforms are applied. These reductions occur because the switching frequency of the LED driver no longer operates at a fixed frequency but varies within certain bands. The LED brightness level tends to settle at a constant value of 235 lux when the disturbance signals are applied. This is because the disturbance signals cause the dimming function of the system to stop working properly.
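    A purely conceptual numpy/scipy simulation of why spreading the switching frequency lowers the peak spectral level: a square switching waveform at a fixed frequency is compared with one whose instantaneous frequency is modulated by a triangular disturbance. The sample rate, deviation, modulation rate, and the 669 kHz reference (borrowed from the companion study below) are illustrative and unrelated to the measured hardware.

```python
import numpy as np
from scipy import signal as sig

fs = 50e6                       # simulation sample rate
t = np.arange(0, 2e-3, 1 / fs)
f0 = 669e3                      # fixed switching frequency (illustrative reference)

# Fixed-frequency switching waveform
fixed = sig.square(2 * np.pi * f0 * t)

# Spread-spectrum version: a triangular disturbance modulates the instantaneous frequency
dev = 30e3                                          # hypothetical frequency deviation
tri = sig.sawtooth(2 * np.pi * 1e3 * t, width=0.5)  # 1 kHz triangular disturbance
phase = 2 * np.pi * np.cumsum(f0 + dev * tri) / fs
spread = sig.square(phase)

# The peak spectral level drops because the switching energy is smeared over a band
for name, x in (("fixed", fixed), ("spread", spread)):
    spec = 20 * np.log10(np.abs(np.fft.rfft(x)) + 1e-12)
    print(name, "peak bin level:", round(spec.max(), 1), "dB (relative)")
```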

    The Effects of Spread-Spectrum Techniques in Mitigating Conducted EMI to LED Luminance

    Rapid voltage and current changes in the now-ubiquitous LED driver have the potential to interfere with other devices. Solutions based on special converter designs, component designs, EMI filters, and spread-spectrum techniques have been proposed. Given cost-size-weight constraints, the spread-spectrum technique is a promising candidate for alleviating the EMI problem in LED applications. In this paper, the effectiveness of the conducted-EMI suppression performance of the spread-spectrum technique is evaluated. The spread-spectrum technique is applied by injecting three disturbance signal profiles, namely filtered-square, triangular, and sine signals, into the switching pattern of a buck LED driver. From the test results, the 472.5 kHz triangular and 525 kHz sine signals can reduce EMI by about 42 dBuV, while the filtered-square signal can reduce EMI by 40.70 dBuV, compared with the fundamental constant-frequency reference of 669 kHz. The average reduction in the power level of the three signals in the frequency range of 199 kHz to 925 kHz is 5.154281 dBuV, and the filtered-square signal reduces the average power level better than the other disturbance signals, at 5.852618 dBuV. The LED luminance decreases by about 2814 lux when the spread-spectrum technique is applied to the system.